1.  INTRODUCTION


As  a  decision  maker,  a  person  or  a  machine  bases  decisions  on  information  received  from 
the  environment.  Often,  information  pertinent  to  a  specific  decision  comes  from  many  sources. 
The  purpose  of  this  report  is  to  develop  the  formulas  to  be  used  for  linearly  combining  estimates. 
It  is  conceived  that  the  estimates  are  being  generated  by  some  form  of  measurement  where  the 
uncertainty  can  be  described  as  a  Gaussian  function.  Examples  of  this  type  of  measurement  can 
include  sensor  systems,  experimentation,  or  value  judgements.  To  use  these  methods,  the 
uncertainty  associated  with  each  source  is  assumed  to  be  known.  This  report  starts  with  the 
simplest  situation  and  then  looks  at  increasingly  complicated  fusion  problems. 

As  an  example,  consider  an  active  protection  system  for  a  tank.  The  tank  may  have  several 
sensor  systems  that  estimate  properties  of  an  incoming  projectile.  Interferometers,  range 
sensors,  and  velocity  sensors  can  be  combined  to  give  several  independent  estimates  of  a 
projectile's  position  at  a  specific  time.  Each  of  these  estimates  should  be  combined  into  a  single 
improved  estimate  of  position.  This  improved  position  estimate  is  then  used  to  estimate  the 
projectile’s  trajectory. 

Least-squares estimation selects the values of the parameters in a mathematical model that
minimize  the  squared  differences  between  the  mathematical  model  and  a  set  of  observations. 
Data  fusion  methods  are  based  on  a  priori  knowledge  of  a  source’s  measurement  error  and  result 
in  a  weighted  average  of  the  observations.  Three  general  data  processing  approaches  to 
estimation  are  "en  bloc,"  recursive,  and  iterative  estimation.  The  iterative  approach  calls  for 
multiple  passes  through  the  data  and  will  not  be  discussed  in  this  paper.  In  the  "en  bloc"  method, 
all  the  data  is  processed  at  once  to  calculate  the  estimate.  When  the  weights  are  calculated  "en 
bloc,"  each  weight  indicates  the  relative  value  of  the  source.  The  recursive  method,  where  the 
data  is  processed  one  observation  at  a  time,  develops  the  combination  rule  used  in  Kalman 
filtering.  Although  there  is  a  statistical  problem  with  the  recursive  method  in  the  initialization  of 
the  process,  the  recursive  formulation  of  the  least-squares  solution  seems  to  be  preferred  in  many 
fields,  including  electronics,  economics,  and  biology.  The  recursive  least-squares  method  has 
several  pragmatic  advantages  over  the  "en  bloc"  method  in  real  time  cases.  These  include: 

1.  An estimate and its error distribution are always available.

2.  A  decision  can  be  made  based  on  the  current  error  distribution. 

3.  It  can  provide  more  insight  to  the  actual  problem. 

4.  It can be modified into different approximation techniques when the underlying
least-squares assumptions are compromised.

In  recursive  estimation,  the  estimate  is  updated  each  time  an  observation  becomes  available. 
The  weight  associated  with  the  observation  indicates  the  value  of  the  observation  in  relation  to 
the  value  of  the  current  estimate.  The  change  in  the  estimate  as  a  result  of  the  update  is  called 
the  gradient. 

A  fundamental  process  in  data  fusion  is  to  find  a  representation  of  the  target,  or  unknown 
system,  so  that  updates  based  on  new  information  depend  only  on  the  current  estimate  and  the 
new  data.  This  characteristic  is  referred  to  as  the  Markov  property.  In  some  situations,  a  list  of 
resources,  a  deployment  pattern,  some  doctrinal  procedures,  and  a  location  may  constitute  a 
Markov  representation.  When  the  new  information  becomes  available,  the  estimate  is  changed 
by  a  gradient  that  reflects  the  uncertainty  associated  with  both  the  estimate  and  new  observation. 
The  ideas  discussed  herein  can  be  used  to  determine  the  proper  gradient  when  the  uncertainty 
is  known. 

Consider  the  intelligence  problem  of  a  command  center.  Information  must  be  combined  and 
processed  to  identify  enemy  locations,  type  of  units,  identity  of  units,  and  the  intentions  or  orders 
of  the  unit.  Sources  of  information  are  reports  from  imagery  intelligence,  signal  intelligence,  and 
human  intelligence.  The  value  of  each  report  depends  on  its  source  and  its  timeliness.  As  new 
information  comes  in,  the  current  target  estimate  must  be  updated.  Each  report  can  be  thought 
of  as  producing  a  gradient.  When  mathematical  models  of  the  target  are  available,  it  is  possible 
to  develop  automated  methods  resulting  in  the  best  update.  Even  when  automated  methods  are 
not  possible,  it  is  usually  desirable  to  find  a  Markov  representation  and  use  this  type  of  reasoning. 

The report progresses on a case-by-case basis. The first seven cases deal with "en bloc"
procedures, while the last two cases are recursive. Case One considers the problem of
estimating  a  single  parameter  from  two  noisy  estimates.  The  basic  method  for  solving  the 
problem  is  demonstrated,  and  the  formula  for  combining  information  is  given.  Case  Two 
discusses  the  modifications  to  the  basic  formula  if  the  uncertainty  is  a  function  of  some  variable 
(time  or  range  for  example).  Case  Three  and  Case  Four  introduce  correlation  between 
observations  to  the  first  two  cases.  Cases  Five  and  Six  extend  the  ideas  to  three  observations 
and  present  the  solution  in  the  form  of  the  general  solution.  Case  Seven  presents  the  general 
"en bloc" solution as a summary of the previous situations. Case Eight presents the recursive
method  for  solving  this  type  of  problem.  Case  Nine  introduces  the  recursive  method  for 
combining  vector  estimates. 

2.  CASE  ONE 

The method described here is for combining two uncorrelated pieces of information
from different sources. The quantity to be estimated is X; the goal is to find the form of the
estimator $\hat{X}$.

2.1  Problem.  Find the best way to linearly combine two observations, $Z_1$ and $Z_2$, if

$$Z_1 = X + V_1, \quad V_1 \sim N(0, \sigma_1^2),$$

$$Z_2 = X + V_2, \quad V_2 \sim N(0, \sigma_2^2), \text{ and}$$

$$E(V_1 V_2) = 0.$$

The estimator will have the form $\hat{X} = k_1 Z_1 + k_2 Z_2$.

2.2  Solution.  If the estimator is to be unbiased, then

$$E(\hat{X} - X) = 0$$

or

$$
\begin{aligned}
E(k_1 Z_1 + k_2 Z_2 - X) &= 0 \\
E(k_1 X + k_1 V_1 + k_2 X + k_2 V_2 - X) &= 0 \\
k_1 X + k_2 X - X + k_1 E(V_1) + k_2 E(V_2) &= 0 \\
k_1 + k_2 - 1 &= 0 \\
k_1 &= 1 - k_2.
\end{aligned}
$$

Since $k_1 + k_2 = 1$, we can simplify the notation by letting $k_1 = k$ and $k_2 = 1 - k$. After doing this,
the form of the estimator is

$$\hat{X} = k Z_1 + (1 - k) Z_2.$$

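The variance of this estimator is found as follows:

$$
\begin{aligned}
E(\hat{X} - X)^2 &= E\bigl(k Z_1 + (1 - k) Z_2 - X\bigr)^2 \\
&= E\bigl(k X + k V_1 + (1 - k) X + (1 - k) V_2 - X\bigr)^2 \\
&= E\bigl[(k + (1 - k) - 1) X + k V_1 + (1 - k) V_2\bigr]^2 \\
&= E\bigl(k V_1 + (1 - k) V_2\bigr)^2 \\
&= E\bigl(k^2 V_1^2 + (1 - k)^2 V_2^2 + 2k(1 - k) V_1 V_2\bigr) \\
&= k^2 \sigma_1^2 + (1 - k)^2 \sigma_2^2 + 0.
\end{aligned}
$$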

To  find  the  minimum  variance  estimator,  take  the  derivative  of  the  variance  with  respect  to  k,  set 
this  expression  equal  to  zero  and  solve  for  k. 

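$$\frac{\partial}{\partial k} E(\hat{X} - X)^2 = 2k\sigma_1^2 - 2(1 - k)\sigma_2^2 = 0$$

$$k\sigma_1^2 + k\sigma_2^2 = \sigma_2^2$$

$$k = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}.$$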

Finding  the  second  derivative  verifies  that  this  value  of  k  is  a  minimum. 

The form of the estimator is

$$\hat{X} = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2} Z_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} Z_2.$$

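The variance of the estimator can then be found. The cross term vanishes because the errors are uncorrelated.

$$
\begin{aligned}
\mathrm{Var}(\hat{X}) &= \left(\frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\right)^2 \mathrm{Var}(Z_1) + \left(\frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\right)^2 \mathrm{Var}(Z_2) + 2\,\frac{\sigma_1^2 \sigma_2^2}{(\sigma_1^2 + \sigma_2^2)^2}\, E(V_1 V_2) \\
&= \frac{\sigma_2^4 \sigma_1^2}{(\sigma_1^2 + \sigma_2^2)^2} + \frac{\sigma_1^4 \sigma_2^2}{(\sigma_1^2 + \sigma_2^2)^2} \\
&= \frac{\sigma_1^2 \sigma_2^2 (\sigma_1^2 + \sigma_2^2)}{(\sigma_1^2 + \sigma_2^2)^2} \\
&= \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}.
\end{aligned}
$$
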


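To summarize, the estimate is

$$\hat{X} = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2} Z_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} Z_2$$

with a variance of

$$\mathrm{Var}(\hat{X}) = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}.$$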
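
The Case One result can be sketched in a few lines of Python. This is only an illustration under the stated assumptions (two uncorrelated observations with known variances); the function name `fuse_two` and the sample numbers are hypothetical and do not come from the report.

```python
def fuse_two(z1, var1, z2, var2):
    """Minimum-variance linear combination of two uncorrelated observations.

    var1 and var2 are the known error variances (sigma_1^2 and sigma_2^2).
    """
    k = var2 / (var1 + var2)                 # weight on z1; (1 - k) goes to z2
    estimate = k * z1 + (1.0 - k) * z2
    variance = var1 * var2 / (var1 + var2)   # never larger than min(var1, var2)
    return estimate, variance


# Hypothetical numbers: a poor observation (variance 4) and a good one (variance 1).
estimate, variance = fuse_two(10.0, 4.0, 12.0, 1.0)
print(estimate, variance)
```

The fused estimate lies closer to the observation with the smaller variance, and the fused variance is smaller than either input variance.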


3.  CASE  TWO 

In this section, the results of the previous case are extended to consider the situation where
the uncertainty has a functional form. For example, the uncertainty of a measurement may be
a function of the magnitude of the measurement. This is the case for radars measuring range
and for value judgements. Also, the uncertainty associated with some information may increase
over time. This happens in Kalman filtering when the current estimate is propagated forward in
time as a prediction based on the state model.


If the variance of each observation error is a function of a variable, then the results of Case One can
be rewritten as follows.

Assume

$$Z_1 = X + V_1, \quad V_1 \sim N(0, f(R)),$$

$$Z_2 = X + V_2, \quad V_2 \sim N(0, f(t)), \text{ and}$$

$$E(V_1 V_2) = 0.$$

Then,

$$\hat{X} = \frac{f(t)}{f(t) + f(R)} Z_1 + \frac{f(R)}{f(t) + f(R)} Z_2$$

and

$$\mathrm{Var}(\hat{X}) = \frac{f(t)\, f(R)}{f(t) + f(R)}.$$


Consider  the  following  applications: 


•  Example 1.

Find the properties of the best estimator for two sensors if one has a constant
standard deviation of 1 m and the other has a standard deviation of 0.05R, where R is the range
in meters; i.e.,

$$V_1 \sim N(0, 1^2),$$

$$V_2 \sim N(0, (0.05R)^2).$$

By the above results,

$$\hat{X} = \frac{(0.05R)^2}{1 + (0.05R)^2} Z_1 + \frac{1}{1 + (0.05R)^2} Z_2, \text{ and}$$

$$\mathrm{Var}(\hat{X}) = \frac{(0.05R)^2}{1 + (0.05R)^2} = \frac{1}{1 + \dfrac{1}{(0.05R)^2}}.$$

It can be seen that when R is large the variance approaches one, and when R is small it
approaches $(0.05R)^2$ (and then goes to zero as R goes to zero).
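
The limiting behavior in Example 1 can be checked numerically. The sketch below assumes the two sensor models stated in the example (a 1 m standard deviation and a 0.05R standard deviation); the function name `range_sensor_fusion` and the sample ranges are illustrative choices, not part of the report.

```python
def range_sensor_fusion(range_m):
    """Weights and fused variance for Example 1 at a given range in meters."""
    var1 = 1.0 ** 2                  # sensor with constant 1 m standard deviation
    var2 = (0.05 * range_m) ** 2     # sensor whose standard deviation is 0.05 R
    k1 = var2 / (var1 + var2)        # weight on Z1
    k2 = var1 / (var1 + var2)        # weight on Z2
    fused_var = var1 * var2 / (var1 + var2)
    return k1, k2, fused_var


# Short, medium, and long ranges (hypothetical values).
for r in (0.1, 20.0, 1000.0):
    print(r, range_sensor_fusion(r))
```

At R = 20 m the two sensors are equally accurate and share the weight equally; at long range nearly all of the weight moves to the constant-accuracy sensor.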

•  Example  2. 

Find  the  properties  of  an  estimator  that  combines  information  whose  variance  increases 
exponentially  with  time;  i.e., 

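$$V_1 \sim N(0, e^{t_1} \sigma_1^2)$$

and

$$V_2 \sim N(0, e^{t_2} \sigma_2^2).$$

If we assume the estimates were made $t_1$ and $t_2$ time units in the past, then

$$\hat{X} = \frac{e^{t_2} \sigma_2^2}{e^{t_1} \sigma_1^2 + e^{t_2} \sigma_2^2} Z_1 + \frac{e^{t_1} \sigma_1^2}{e^{t_1} \sigma_1^2 + e^{t_2} \sigma_2^2} Z_2$$

and

$$\mathrm{Var}(\hat{X}) = \frac{e^{t_2} \sigma_2^2 \, e^{t_1} \sigma_1^2}{e^{t_1} \sigma_1^2 + e^{t_2} \sigma_2^2} = \frac{e^{t_1 + t_2} \sigma_1^2 \sigma_2^2}{e^{t_1} \sigma_1^2 + e^{t_2} \sigma_2^2}.$$
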

In  many  situations,  the  results  of  Case  One  can  be  extended  to  find  a  quantitative  data-fusion 
technique.  The  variations  will  depend  on  specific  knowledge  of  the  situation. 

4.  CASE  THREE 

Case  Three  extends  Case  One  to  consider  the  effects  of  correlated  errors.  Sometimes  the 
information  sources  are  not  independent  and  the  errors  associated  with  each  contain  some 
common  components.  If  we  know  the  amount  of  association,  the  form  of  the  estimator  can  be 
derived. 

For two variables with correlated noise, the assumptions are:

$$Z_1 = X + V_1, \quad V_1 \sim N(0, \sigma_1^2),$$

$$Z_2 = X + V_2, \quad V_2 \sim N(0, \sigma_2^2), \text{ and}$$

$$E(V_1 V_2) = \rho \sigma_1 \sigma_2.$$

As in Case One,

$$\hat{X} = k Z_1 + (1 - k) Z_2.$$

Under the assumptions, the variance of the estimator is

$$
\begin{aligned}
E(\hat{X} - X)^2 &= E\bigl(k^2 V_1^2 + (1 - k)^2 V_2^2 + 2k(1 - k) V_1 V_2\bigr) \\
&= k^2 \sigma_1^2 + (1 - k)^2 \sigma_2^2 + 2k(1 - k)\rho \sigma_1 \sigma_2.
\end{aligned}
$$


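The minimum variance estimator is found as follows.

$$
\begin{aligned}
\frac{\partial}{\partial k} E(\hat{X} - X)^2 &= 2k\sigma_1^2 - 2(1 - k)\sigma_2^2 + 2\rho\sigma_1\sigma_2 - 4k\rho\sigma_1\sigma_2 \\
&= (2\sigma_1^2 - 4\rho\sigma_1\sigma_2 + 2\sigma_2^2)k - 2\sigma_2^2 + 2\rho\sigma_1\sigma_2.
\end{aligned}
$$

After setting the partial equal to zero and solving for k, we get

$$k = \frac{\sigma_2^2 - \rho\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}.$$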

Putting  this  value  into  the  above  variance  expression,  we  have  the  following  steps  to  arrive 
at  a  simplified  variance  expression. 
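
$$
\begin{aligned}
\mathrm{Var}(\hat{X}) &= k^2\sigma_1^2 + (1 - k)^2\sigma_2^2 + 2k(1 - k)\rho\sigma_1\sigma_2 \\
&= \frac{(\sigma_2^2 - \rho\sigma_1\sigma_2)^2\sigma_1^2 + (\sigma_1^2 - \rho\sigma_1\sigma_2)^2\sigma_2^2 + 2(\sigma_2^2 - \rho\sigma_1\sigma_2)(\sigma_1^2 - \rho\sigma_1\sigma_2)\rho\sigma_1\sigma_2}{(\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2)^2} \\
&= \frac{\sigma_1^2\sigma_2^2(\sigma_2^2 - 2\rho\sigma_1\sigma_2 + \rho^2\sigma_1^2) + \sigma_1^2\sigma_2^2(\sigma_1^2 - 2\rho\sigma_1\sigma_2 + \rho^2\sigma_2^2)}{(\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2)^2} \\
&\quad + \frac{2\rho\sigma_1\sigma_2(\sigma_1^2\sigma_2^2 - \rho\sigma_1^3\sigma_2 - \rho\sigma_1\sigma_2^3 + \rho^2\sigma_1^2\sigma_2^2)}{(\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2)^2} \\
&= \frac{\sigma_1^2\sigma_2^2\bigl[\sigma_1^2 + \sigma_2^2 + \rho^2\sigma_1^2 + \rho^2\sigma_2^2 - 4\rho\sigma_1\sigma_2 + 2\rho\sigma_1\sigma_2 - 2\rho^2\sigma_1^2 - 2\rho^2\sigma_2^2 + 2\rho^3\sigma_1\sigma_2\bigr]}{(\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2)^2} \\
&= \frac{\sigma_1^2\sigma_2^2\bigl[\sigma_1^2 + \sigma_2^2 - \rho^2\sigma_1^2 - \rho^2\sigma_2^2 - 2\rho\sigma_1\sigma_2 + 2\rho^3\sigma_1\sigma_2\bigr]}{(\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2)^2} \\
&= \frac{\sigma_1^2\sigma_2^2(1 - \rho^2)(\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2)}{(\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2)^2} \\
&= \frac{\sigma_1^2\sigma_2^2(1 - \rho^2)}{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}.
\end{aligned}
$$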


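In summary, the estimator has the form

$$\hat{X} = \frac{\sigma_2^2 - \rho\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2} Z_1 + \frac{\sigma_1^2 - \rho\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2} Z_2$$

with variance

$$\mathrm{Var}(\hat{X}) = \frac{\sigma_1^2\sigma_2^2(1 - \rho^2)}{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}.$$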
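
As a rough illustration of the Case Three summary, the following Python sketch computes the weight, the estimate, and the variance for two correlated observations. The function name `fuse_two_correlated`, the argument order, and the sample values are assumptions made for this example only.

```python
def fuse_two_correlated(z1, sigma1, z2, sigma2, rho):
    """Minimum-variance combination of two observations with correlated errors.

    sigma1 and sigma2 are standard deviations; rho is the correlation of V1 and V2.
    """
    cov = rho * sigma1 * sigma2
    denom = sigma1 ** 2 + sigma2 ** 2 - 2.0 * cov
    if denom == 0.0:
        # Happens only when rho = 1 and sigma1 = sigma2: the weights are undefined.
        raise ValueError("observations carry identical error; weights are undefined")
    k = (sigma2 ** 2 - cov) / denom          # weight on z1
    estimate = k * z1 + (1.0 - k) * z2
    variance = sigma1 ** 2 * sigma2 ** 2 * (1.0 - rho ** 2) / denom
    return estimate, variance, k


# With rho = 0 this reduces to Case One.
print(fuse_two_correlated(10.0, 2.0, 12.0, 1.0, 0.0))
# With rho = sigma2 / sigma1 the poorer observation receives zero weight.
print(fuse_two_correlated(10.0, 2.0, 12.0, 1.0, 0.5))
```

The second call shows a consequence of the weight formula: when the correlation equals the ratio of the smaller to the larger standard deviation, the numerator of k is zero and the estimate is simply the better observation.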

5.  CASE  FOUR

This case presents the most general form of an estimator for combining two pieces of
information. Case Three is extended to consider the case when the uncertainty associated with
correlated information has a functional form. The variance is expressed as a function of a
variable such as time, range, or degrees from boresight. With correlated observations, the form
is similar to Case Two.

Assume

$$Z_1 = X + V_1, \quad V_1 \sim N(0, f(r)),$$

$$Z_2 = X + V_2, \quad V_2 \sim N(0, g(t)), \text{ and}$$

$$E(V_1 V_2) = h(r,t).$$

When combining two pieces of information, the most general form of the estimator is

$$\hat{X} = \frac{g(t) - h(r,t)}{f(r) + g(t) - 2h(r,t)} Z_1 + \frac{f(r) - h(r,t)}{f(r) + g(t) - 2h(r,t)} Z_2$$

with a variance of

$$\mathrm{Var}(\hat{X}) = \frac{f(r)\, g(t) \left[ 1 - \dfrac{h(r,t)^2}{g(t)\, f(r)} \right]}{f(r) + g(t) - 2h(r,t)}.$$



6.  CASE  FIVE 

In  this  case,  three  data  points  with  uncorrelated  measurement  errors  are  combined  as  a  single 
estimate.  This  extends  Case  One  so  that  three  pieces  of  information  can  be  processed  at  one 
time.  The  assumptions  for  i  =  {1 ,2,3}  are 

$$Z_i = X + V_i, \quad V_i \sim N(0, \sigma_i^2),$$

$$E(V_i V_j) = 0 \text{ if } i \neq j.$$

Using the same reasoning as in Case One, it can be shown that $1 = k_1 + k_2 + k_3$. Note that
$k_3 = 1 - k_1 - k_2$; thus, $k_3$ can be eliminated. The form of the estimator is
$\hat{X} = k_1 Z_1 + k_2 Z_2 + (1 - k_1 - k_2) Z_3$. To find the values of $k_1$ and $k_2$ which minimize the variance, the partial
derivatives of $\mathrm{Var}(\hat{X})$ are found and set equal to zero. This set of equations is then solved for
$k_1$ and $k_2$. The following equations realize these steps.

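$$\mathrm{Var}(\hat{X}) = k_1^2\sigma_1^2 + k_2^2\sigma_2^2 + (1 - k_1 - k_2)^2\sigma_3^2$$

$$\frac{\partial}{\partial k_1} \mathrm{Var}(\hat{X}) = 2k_1\sigma_1^2 - 2(1 - k_1 - k_2)\sigma_3^2$$

$$\frac{\partial}{\partial k_2} \mathrm{Var}(\hat{X}) = 2k_2\sigma_2^2 - 2(1 - k_1 - k_2)\sigma_3^2$$

Setting the partials equal to zero, we have the following two equations:

$$(\sigma_1^2 + \sigma_3^2)k_1 + \sigma_3^2 k_2 = \sigma_3^2, \text{ and}$$

$$\sigma_3^2 k_1 + (\sigma_2^2 + \sigma_3^2)k_2 = \sigma_3^2.$$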

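Using matrix notation, these equations can be written as

$$
\begin{bmatrix} \sigma_1^2 + \sigma_3^2 & \sigma_3^2 \\ \sigma_3^2 & \sigma_2^2 + \sigma_3^2 \end{bmatrix}
\begin{bmatrix} k_1 \\ k_2 \end{bmatrix} =
\begin{bmatrix} \sigma_3^2 \\ \sigma_3^2 \end{bmatrix}.
$$

The above matrix equation can be rewritten as

$$
\begin{bmatrix} k_1 \\ k_2 \end{bmatrix} =
\begin{bmatrix} \sigma_1^2 + \sigma_3^2 & \sigma_3^2 \\ \sigma_3^2 & \sigma_2^2 + \sigma_3^2 \end{bmatrix}^{-1}
\begin{bmatrix} \sigma_3^2 \\ \sigma_3^2 \end{bmatrix} =
\frac{1}{\sigma_1^2\sigma_2^2 + \sigma_1^2\sigma_3^2 + \sigma_2^2\sigma_3^2}
\begin{bmatrix} \sigma_2^2\sigma_3^2 \\ \sigma_1^2\sigma_3^2 \end{bmatrix}.
$$
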
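The variance of the estimator is

$$
\begin{aligned}
\mathrm{Var}(\hat{X}) &= k_1^2\sigma_1^2 + k_2^2\sigma_2^2 + k_3^2\sigma_3^2 \\
&= \frac{\sigma_2^4\sigma_3^4\sigma_1^2 + \sigma_1^4\sigma_3^4\sigma_2^2 + \sigma_1^4\sigma_2^4\sigma_3^2}{(\sigma_1^2\sigma_2^2 + \sigma_1^2\sigma_3^2 + \sigma_2^2\sigma_3^2)^2} \\
&= \frac{\sigma_1^2\sigma_2^2\sigma_3^2}{\sigma_1^2\sigma_2^2 + \sigma_1^2\sigma_3^2 + \sigma_2^2\sigma_3^2} \\
&= \frac{1}{\dfrac{1}{\sigma_1^2} + \dfrac{1}{\sigma_2^2} + \dfrac{1}{\sigma_3^2}}.
\end{aligned}
$$
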


The variance of the estimator has the same form as the resistance of a parallel circuit. From an
electrical engineering perspective, the ideas could be expressed as resistance to the flow of
information. Indeed, a circuit of parallel transistors would be able to simulate this situation. The
gate voltage would be set to create the proper resistance and the current through the circuit
would be the variance of the estimate. For N independent measurements, this can be
generalized to

$$k_i = \frac{\prod_{j \neq i} \sigma_j^2}{\sum_{m=1}^{N} \prod_{j \neq m} \sigma_j^2} = \frac{1/\sigma_i^2}{\sum_{m=1}^{N} 1/\sigma_m^2}.$$



In general, one would have the estimate

$$\hat{X} = \sum_{i=1}^{N} k_i Z_i$$

with a variance of

$$\mathrm{Var}(\hat{X}) = \frac{1}{\sum_{i=1}^{N} \dfrac{1}{\sigma_i^2}}.$$



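In summary, for three observations the form of the estimator is

$$\hat{X} = \frac{\sigma_2^2\sigma_3^2}{\sigma_1^2\sigma_2^2 + \sigma_1^2\sigma_3^2 + \sigma_2^2\sigma_3^2} Z_1 + \frac{\sigma_1^2\sigma_3^2}{\sigma_1^2\sigma_2^2 + \sigma_1^2\sigma_3^2 + \sigma_2^2\sigma_3^2} Z_2 + \frac{\sigma_1^2\sigma_2^2}{\sigma_1^2\sigma_2^2 + \sigma_1^2\sigma_3^2 + \sigma_2^2\sigma_3^2} Z_3.$$

The variance of $\hat{X}$ is

$$\mathrm{Var}(\hat{X}) = \frac{\sigma_1^2\sigma_2^2\sigma_3^2}{\sigma_1^2\sigma_2^2 + \sigma_1^2\sigma_3^2 + \sigma_2^2\sigma_3^2}.$$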
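
The generalization to N independent measurements amounts to weighting each observation by its inverse variance. A minimal Python sketch follows, assuming uncorrelated observations with known variances; the name `fuse_independent` and the sample data are hypothetical.

```python
def fuse_independent(observations, variances):
    """Inverse-variance weighting of N uncorrelated observations of one quantity."""
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    weights = [p / total for p in precisions]       # the k_i; they sum to one
    estimate = sum(k * z for k, z in zip(weights, observations))
    variance = 1.0 / total                          # parallel-resistance form
    return estimate, variance, weights


# Three hypothetical observations with variances 2, 4, and 4.
estimate, variance, weights = fuse_independent([10.0, 12.0, 14.0], [2.0, 4.0, 4.0])
print(estimate, variance, weights)
```

For these inputs the weights agree with the three-observation formula: the common denominator is 8 + 8 + 16 = 32, so the first weight is 16/32.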

7.  CASE  SIX


This  case  extends  Case  Five  by  including  the  effects  of  correlated  noise.  The  matrix 
representation  used  in  this  and  the  previous  case  is  needed  for  the  general  case  presented  in  the 
next  section.  The  assumptions  for  i  =  {1,2,3}  are  represented  as  follows 

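$$Z_i = X + V_i, \quad V_i \sim N(0, \sigma_i^2),$$

$$E(V_i V_j) = \rho_{ij}\sigma_i\sigma_j \text{ if } i \neq j.$$

As in Case Five, $1 = k_1 + k_2 + k_3$. Due to the correlation between observations, the variance
expression is more complicated. It is

$$
\begin{aligned}
\mathrm{Var}(\hat{X}) &= k_1^2\sigma_1^2 + k_2^2\sigma_2^2 + k_3^2\sigma_3^2 + 2\rho_{12}k_1k_2\sigma_1\sigma_2 + 2\rho_{13}k_1k_3\sigma_1\sigma_3 + 2\rho_{23}k_2k_3\sigma_2\sigma_3 \\
&= k_1^2\sigma_1^2 + k_2^2\sigma_2^2 + (1 - k_1 - k_2)^2\sigma_3^2 + 2\rho_{12}k_1k_2\sigma_1\sigma_2 \\
&\quad + 2\rho_{13}k_1(1 - k_1 - k_2)\sigma_1\sigma_3 + 2\rho_{23}k_2(1 - k_1 - k_2)\sigma_2\sigma_3.
\end{aligned}
$$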

To  find  the  minimum  variance,  the  partial  derivatives  are  set  equal  to  zero  and  then  the  resulting 
matrix  equation  is  solved. 

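$$
\begin{aligned}
\frac{\partial}{\partial k_1} \mathrm{Var}(\hat{X}) &= 2\sigma_1^2k_1 - 2(1 - k_1 - k_2)\sigma_3^2 + 2\rho_{12}\sigma_1\sigma_2k_2 \\
&\quad + 2\rho_{13}\sigma_1\sigma_3(1 - 2k_1 - k_2) - 2\rho_{23}\sigma_2\sigma_3k_2 \\
&= 2(\sigma_3^2 + \rho_{12}\sigma_1\sigma_2 - \rho_{13}\sigma_1\sigma_3 - \rho_{23}\sigma_2\sigma_3)k_2 \\
&\quad + 2(\sigma_1^2 + \sigma_3^2 - 2\rho_{13}\sigma_1\sigma_3)k_1 \\
&\quad - 2(\sigma_3^2 - \rho_{13}\sigma_1\sigma_3) \\
&= 0.
\end{aligned}
$$

Similarly,

$$
\begin{aligned}
\frac{\partial}{\partial k_2} \mathrm{Var}(\hat{X}) &= 2(\sigma_3^2 + \rho_{12}\sigma_1\sigma_2 - \rho_{13}\sigma_1\sigma_3 - \rho_{23}\sigma_2\sigma_3)k_1 \\
&\quad + 2(\sigma_2^2 + \sigma_3^2 - 2\rho_{23}\sigma_2\sigma_3)k_2 \\
&\quad - 2(\sigma_3^2 - \rho_{23}\sigma_2\sigma_3) \\
&= 0.
\end{aligned}
$$

In matrix representation, this is

$$
\begin{bmatrix}
\sigma_1^2 + \sigma_3^2 - 2\rho_{13}\sigma_1\sigma_3 & \sigma_3^2 + \rho_{12}\sigma_1\sigma_2 - \rho_{13}\sigma_1\sigma_3 - \rho_{23}\sigma_2\sigma_3 \\
\sigma_3^2 + \rho_{12}\sigma_1\sigma_2 - \rho_{13}\sigma_1\sigma_3 - \rho_{23}\sigma_2\sigma_3 & \sigma_2^2 + \sigma_3^2 - 2\rho_{23}\sigma_2\sigma_3
\end{bmatrix}
\begin{bmatrix} k_1 \\ k_2 \end{bmatrix} =
\begin{bmatrix} \sigma_3^2 - \rho_{13}\sigma_1\sigma_3 \\ \sigma_3^2 - \rho_{23}\sigma_2\sigma_3 \end{bmatrix}.
$$
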

The  above  can  be  solved  by  using  Cramer’s  rule,  or  some  other  matrix  algebra  technique. 

8.  CASE  SEVEN 

The general case for finding the optimal set of weights $\{k_i\}$ is developed in this section. This
is the last "en bloc" method discussed. The results are derived by finding the partial of the
variance expression with respect to one variable ($k_l$); then, after setting this expression equal to
zero and solving it, the proper matrix equation is shown. The approach taken is to break the
expression into a series of terms. Each term is examined in sequence; then these results are
combined into the partial of the variance with respect to the selected variable.

The variance of the estimator for the general case is

$$\mathrm{Var}(\hat{X}) = \sum_{i=1}^{n} \sigma_i^2 k_i^2 + \sum_{i=1}^{n-1} \sum_{j>i} 2\rho_{ij}\sigma_i\sigma_j k_i k_j. \qquad (1)$$

Note that

$$k_n = 1 - \sum_{m=1}^{n-1} k_m.$$



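To get the partial of the variance with respect to $k_l$, $0 < l \leq n-1$, the partial of the first term of
Equation (1) is found and then the partial of the second term. For the first term, we have

$$
\begin{aligned}
\frac{\partial}{\partial k_l} \sum_{i=1}^{n} \sigma_i^2 k_i^2 &= \frac{\partial}{\partial k_l} \sum_{i=1}^{n-1} \sigma_i^2 k_i^2 + \frac{\partial}{\partial k_l}\, \sigma_n^2 \Bigl(1 - \sum_{m=1}^{n-1} k_m\Bigr)^2 \\
&= 2\sigma_l^2 k_l - 2\sigma_n^2 \Bigl(1 - \sum_{m=1}^{n-1} k_m\Bigr) \\
&= 2\sigma_l^2 k_l - 2\sigma_n^2 + 2\sigma_n^2 k_l + 2\sum_{i \neq l}^{n-1} \sigma_n^2 k_i. \qquad (2)
\end{aligned}
$$
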


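The second term is more complicated than the first term.

$$
\frac{\partial}{\partial k_l} \sum_{i=1}^{n-1} \sum_{j>i} 2\rho_{ij}\sigma_i\sigma_j k_i k_j =
\frac{\partial}{\partial k_l} \sum_{i=1}^{n-2} \sum_{j>i}^{n-1} 2\rho_{ij}\sigma_i\sigma_j k_i k_j +
\frac{\partial}{\partial k_l} \sum_{i=1}^{n-1} 2\rho_{in}\sigma_i\sigma_n k_i \Bigl(1 - \sum_{m=1}^{n-1} k_m\Bigr).
$$
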


Each of the above terms will be considered (and referred to as the third and fourth term). In the
third term, note that $i < j$; so, if the partial term is nonzero, either $i = l$ or $j = l$. For example, let
n = 5 and l = 3; then if i = 1 and j = 3, a nonzero partial term is obtained; likewise, when i = 2 and
j = 3, and when i = 3 and j = 4, nonzero partial terms are obtained. The partial of the third term
can be written as follows:

$$
\frac{\partial}{\partial k_l} \sum_{i=1}^{n-2} \sum_{j>i}^{n-1} 2\rho_{ij}\sigma_i\sigma_j k_i k_j =
\underbrace{\sum_{j=l+1}^{n-1} 2\rho_{lj}\sigma_l\sigma_j k_j}_{\text{if } i = l} +
\underbrace{\sum_{i=1}^{l-1} 2\rho_{il}\sigma_i\sigma_l k_i}_{\text{if } j = l}.
$$
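
As a sketch of the remaining step, differentiating the fourth term directly with respect to $k_l$ under the definitions above gives the following. The arrangement of the result is an assumption; it is chosen so that each piece can be matched against the collected groupings (the $k_l$ terms, the constant terms, and the terms in the other variables).

$$
\begin{aligned}
\frac{\partial}{\partial k_l} \sum_{i=1}^{n-1} 2\rho_{in}\sigma_i\sigma_n k_i \Bigl(1 - \sum_{m=1}^{n-1} k_m\Bigr)
&= 2\rho_{ln}\sigma_l\sigma_n \Bigl(1 - \sum_{m=1}^{n-1} k_m\Bigr) - 2\sum_{i=1}^{n-1} \rho_{in}\sigma_i\sigma_n k_i \\
&= 2\rho_{ln}\sigma_l\sigma_n - 4\rho_{ln}\sigma_l\sigma_n k_l - 2\sum_{i \neq l}^{n-1} \rho_{ln}\sigma_l\sigma_n k_i - 2\sum_{i \neq l}^{n-1} \rho_{in}\sigma_i\sigma_n k_i.
\end{aligned}
$$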

The next task is to collect the various expressions (2, 3, 4, 5) that make up the partial
derivative and then to organize them in some meaningful format. There will be three groupings:
terms containing $k_l$, terms containing no variables, and terms containing variables other than
$k_l$.

Terms associated with $k_l$ are

$$2\sigma_l^2 k_l + 2\sigma_n^2 k_l - 4\rho_{ln}\sigma_l\sigma_n k_l.$$

Terms that do not contain a variable are

$$-2\sigma_n^2 + 2\rho_{ln}\sigma_l\sigma_n.$$

Terms associated with a variable $k_i$, where $i \neq l$, are

$$2\sum_{i \neq l}^{n-1} \sigma_n^2 k_i + 2\sum_{i \neq l}^{n-1} \rho_{li}\sigma_l\sigma_i k_i - 2\sum_{i \neq l}^{n-1} \rho_{ln}\sigma_l\sigma_n k_i - 2\sum_{i \neq l}^{n-1} \rho_{in}\sigma_i\sigma_n k_i.$$


The above represents the partial derivative with respect to one variable; by doing this for each variable, setting
the result equal to zero, and dividing by two, we can formulate a matrix equation for the general
case.

The indices i, j will be used in place of l and i. The form we wish is

$$AK = C$$

where K is the variable vector of length n - 1, C a vector of constant terms of length n - 1, and
A the coefficient matrix of size (n - 1) x (n - 1) of the vector K.

The diagonal terms of A, $(a_{ii})$, will be

$$a_{ii} = \sigma_i^2 + \sigma_n^2 - 2\rho_{in}\sigma_i\sigma_n,$$

and the off-diagonal terms of A, $(a_{ij})$, where $i \neq j$, will be

$$a_{ij} = \sigma_n^2 + \rho_{ij}\sigma_i\sigma_j - \rho_{jn}\sigma_j\sigma_n - \rho_{in}\sigma_i\sigma_n.$$

The elements of the vector C, $(c_i)$, are

$$c_i = \sigma_n^2 - \rho_{in}\sigma_i\sigma_n.$$
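
The values of K can be found using matrix algebra ($K = A^{-1}C$). As with the two-observation
cases, functions can be substituted for the variance terms to produce classes of variable
estimates.

The matrix A above can also be generated using the covariance matrix by the following
method:

Let $\Sigma$ be the covariance matrix with elements $\Sigma_{ij} = E(V_i V_j) = \rho_{ij}\sigma_i\sigma_j$, so that $\Sigma_{ii} = \sigma_i^2$.

If we generate the following matrices

$$
D_{(n-1) \times n} =
\begin{bmatrix}
1 & 0 & 0 & \cdots & -1 \\
0 & 1 & 0 & \cdots & -1 \\
0 & 0 & 1 & \cdots & -1 \\
\vdots & \vdots & \vdots & \ddots & \vdots
\end{bmatrix},
\qquad
D' =
\begin{bmatrix}
1 & 0 & 0 & \cdots \\
0 & 1 & 0 & \cdots \\
0 & 0 & 1 & \cdots \\
\vdots & \vdots & \vdots & \ddots \\
-1 & -1 & -1 & \cdots
\end{bmatrix},
$$

then $A = D \Sigma D'$.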
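
The general "en bloc" solution can be sketched in plain Python as follows. The sketch builds A element by element (each entry of D Σ D' is Σ_ij - Σ_in - Σ_jn + Σ_nn), forms C, and solves AK = C. The helper `solve_linear`, the function `en_bloc_weights`, and the sample covariance matrices are assumptions for illustration; a production version would more likely call a linear algebra library.

```python
def solve_linear(a, c):
    """Solve a small dense system A K = C by Gaussian elimination with pivoting."""
    n = len(c)
    m = [row[:] + [c[i]] for i, row in enumerate(a)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= factor * m[col][j]
    k = [0.0] * n
    for i in range(n - 1, -1, -1):                      # back substitution
        k[i] = (m[i][n] - sum(m[i][j] * k[j] for j in range(i + 1, n))) / m[i][i]
    return k


def en_bloc_weights(cov):
    """Weights k_1..k_n and fused variance from an n x n error covariance matrix."""
    n = len(cov)
    last = n - 1
    # Entry (i, j) of A = D * Sigma * D', where D = [I | -1].
    a = [[cov[i][j] - cov[i][last] - cov[j][last] + cov[last][last]
          for j in range(last)] for i in range(last)]
    # c_i = sigma_n^2 - rho_in * sigma_i * sigma_n
    c = [cov[last][last] - cov[i][last] for i in range(last)]
    k = solve_linear(a, c)
    weights = k + [1.0 - sum(k)]                        # k_n = 1 - sum of the others
    variance = sum(weights[i] * cov[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    return weights, variance


# Uncorrelated check against Case Five (variances 2, 4, 4).
print(en_bloc_weights([[2.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]))
```

With a diagonal covariance matrix the result reduces to inverse-variance weighting, and with a 2 x 2 matrix it reproduces the Case Three weights.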

9.  CASE  EIGHT


In contrast to "en bloc" procedures, recursive methods process the data one piece at a time
and keep updating the estimate. As an example, assume uncorrelated observations $Z_1$, $Z_2$, and $Z_3$
become available.

Assume one piece of data is available; our first estimate will be $\hat{X}_1$. The process is initialized so
that

$$\hat{X}_1 = Z_1, \quad \mathrm{Var}(\hat{X}_1) = \sigma_1^2.$$


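If observation $Z_2$ becomes available, then

$$\hat{X}_2 = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2} \hat{X}_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} Z_2,$$

with

$$\mathrm{Var}(\hat{X}_2) = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}.$$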

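Assume $Z_3$ becomes available, then

$$\hat{X}_3 = \frac{\sigma_3^2}{\mathrm{Var}(\hat{X}_2) + \sigma_3^2} \hat{X}_2 + \frac{\mathrm{Var}(\hat{X}_2)}{\mathrm{Var}(\hat{X}_2) + \sigma_3^2} Z_3$$

with

$$\mathrm{Var}(\hat{X}_3) = \frac{\mathrm{Var}(\hat{X}_2)\, \sigma_3^2}{\mathrm{Var}(\hat{X}_2) + \sigma_3^2}.$$
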


It  can  be  verified  that  this  result  agrees  with  Case  Five. 

Note that the number of steps used in the recursive solution is a linear function of the number
of data points. (To update the estimate and its variance takes two additions, three multiplications,
and two divisions.) The recursive method is usually preferred for real-time applications and in
control systems.


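Recursive techniques express the estimate as a weighted average of the old estimate and the
new observation. In many formulations, this is expressed as the old estimate plus the change
as a result of the new information. The recursive estimator is:

$$\hat{X}_{i+1} = \frac{\sigma_{i+1}^2}{\mathrm{Var}(\hat{X}_i) + \sigma_{i+1}^2} \hat{X}_i + \frac{\mathrm{Var}(\hat{X}_i)}{\mathrm{Var}(\hat{X}_i) + \sigma_{i+1}^2} Z_{i+1},$$

which can be rewritten as:

$$\hat{X}_{i+1} = \hat{X}_i + \frac{\mathrm{Var}(\hat{X}_i)}{\mathrm{Var}(\hat{X}_i) + \sigma_{i+1}^2} (Z_{i+1} - \hat{X}_i).$$
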


The  second  term  represents  the  gradient  due  to  the  (i+1)st  observation. 
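
The recursive update can be sketched in Python as shown here. It assumes scalar, uncorrelated observations with known variances, initialized with the first observation as described in this section; the names `recursive_update` and `recursive_fuse` and the sample data are hypothetical.

```python
def recursive_update(estimate, variance, z, z_variance):
    """One recursive step: fold a new observation into the current estimate."""
    gain = variance / (variance + z_variance)
    gradient = gain * (z - estimate)         # change due to the new observation
    new_estimate = estimate + gradient
    new_variance = variance * z_variance / (variance + z_variance)
    return new_estimate, new_variance


def recursive_fuse(observations, variances):
    """Process observations one at a time, starting from the first one."""
    estimate, variance = observations[0], variances[0]
    for z, v in zip(observations[1:], variances[1:]):
        estimate, variance = recursive_update(estimate, variance, z, v)
    return estimate, variance


# Same hypothetical data as a three-observation en bloc problem.
print(recursive_fuse([10.0, 12.0, 14.0], [2.0, 4.0, 4.0]))
```

For uncorrelated observations the recursion ends at the same estimate and variance as the inverse-variance "en bloc" weights, which is the agreement with Case Five noted above.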


10.  CASE  NINE 


To extend the idea of a recursive estimator to the vector situation is straightforward. Let
X and $Z_i$ represent vectors and replace the variance terms with the covariance matrices $\Sigma_i$ that
represent the uncertainties associated with each vector to be estimated. Note $X X'$ is
represented by $X^2$ and K is a matrix. The separate observations are assumed to be uncorrelated.

$$Z_1 = X + V_1, \quad V_1 \sim N(0, \Sigma_1),$$

$$Z_2 = X + V_2, \text{ and } V_2 \sim N(0, \Sigma_2).$$

Using the same method as Case One, but with vectors,

$$
\begin{aligned}
E(K_1 Z_1 + K_2 Z_2) &= E[K_1 X + K_1 V_1 + K_2 X + K_2 V_2] \\
&= K_1 X + K_2 X \\
&= (K_1 + K_2) X.
\end{aligned}
$$

If the estimator is to be unbiased, then

$$[K_1 + K_2] X = X.$$

Thus, $K_1 + K_2 = I$. The next example shows a method of solution when the two covariance
structures are diagonal.

•  Example  3. 


When $\Sigma_1$ and $\Sigma_2$ are diagonal, we have a problem that can be decoupled into two problems
similar to Case One. Consider finding the proper weights for two vector observations with the
following error structure. Let

$$\Sigma_1 = \begin{bmatrix} 5 & 0 \\ 0 & 4 \end{bmatrix}, \text{ and } \Sigma_2 = \begin{bmatrix} 3 & 0 \\ 0 & 6 \end{bmatrix}.$$

Both error structures are diagonal, so the matrix problem will be broken into two separate scalar
estimation problems (as in Case One). In the first problem, $\sigma_1^2 = 5$ and $\sigma_2^2 = 3$, yielding weights
of 3/8 and 5/8. In the second, $\sigma_1^2 = 4$ and $\sigma_2^2 = 6$, giving weights of .6 and .4 to the observed
values of the second variable. In matrix notation, the above process for finding the weight to
associate with the second vector observation can be written as follows:

$$
K_2 = \begin{bmatrix} 5 & 0 \\ 0 & 4 \end{bmatrix} \begin{bmatrix} 8 & 0 \\ 0 & 10 \end{bmatrix}^{-1}
= \begin{bmatrix} 5 & 0 \\ 0 & 4 \end{bmatrix} \begin{bmatrix} .125 & 0 \\ 0 & .1 \end{bmatrix}
= \begin{bmatrix} .625 & 0 \\ 0 & .4 \end{bmatrix}.
$$

This example demonstrates that new concepts are not needed when the covariance structure is
diagonal.

The  following  observations  simplify  the  calculations  for  the  nondiagonal  case.  There  is  a 
theorem  that  states  for  any  two  covariance  structures  there  is  a  basis  in  which  they  both  are 
diagonal  (Dempster  1969;  Fukunaga  1972).  Assume  the  appropriate  change  of  basis  is  made. 
Then,  using  the  method  of  Example  Three,  the  appropriate  weights  can  be  found.  A  change  of 
basis  back  to  the  original  coordinate  system  will  result  in  a  matrix  of  full  rank.  We  proceed  to  find 
a  minimum  variance  estimator  as  follows: 

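$$E[(K_1 Z_1 + K_2 Z_2 - X)^2] = E[(K_1 V_1 + K_2 V_2)^2].$$

Since the different observations are uncorrelated, this can be written as

$$
\begin{aligned}
&= E[(K_1 V_1)^2] + E[(K_2 V_2)^2] \\
&= K_1 \Sigma_1 K_1' + K_2 \Sigma_2 K_2'.
\end{aligned}
$$

In the above, $K_2$ can be expressed in terms of $K_1$:

$$= K_1 \Sigma_1 K_1' + (I - K_1) \Sigma_2 (I - K_1)'.$$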


The  error  in  this  case  is  represented  by  a  covariance  matrix.  Minimizing  the  trace  of  this 
covariance  matrix  will  minimize  the  total  estimation  error.  The  rules  for  manipulating  the  trace  of 
a  matrix  are  discussed  by  Athens  (1965).  From  his  summary,  the  rule 

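$$\frac{\partial\, \mathrm{trace}(A X B X')}{\partial X} = A' X B' + A X B$$

is used, letting A be the identity matrix, B be the covariance matrix, and X be $K_1$. The minimum
occurs where

$$2 K_1 \Sigma_1 - 2 (I - K_1) \Sigma_2 = 0;$$

thus,

$$K_1 = \Sigma_2 (\Sigma_1 + \Sigma_2)^{-1}$$

and

$$K_2 = \Sigma_1 (\Sigma_1 + \Sigma_2)^{-1}.$$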


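The general vector solution can be written as a recursive estimator

$$
\begin{aligned}
\hat{X}_{i+1} &= K_1 \hat{X}_i + K_2 Z_{i+1} \\
&= \hat{X}_i + \Sigma_{\hat{X}_i} \bigl(\Sigma_{\hat{X}_i} + \Sigma_{Z_{i+1}}\bigr)^{-1} (Z_{i+1} - \hat{X}_i).
\end{aligned}
$$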


This  should  be  interpreted  as  follows:  the  new  estimate  is  the  old  estimate  plus  the  product  of 
the  value  of  the  new  information,  and  the  distance  between  the  old  estimate  and  the  new 
observation.  The  gradient  is  the  change  to  the  old  estimate.  In  many  instances,  the  true  error 
structure  is  not  known.  In  these  situations,  it  is  sometimes  possible  to  derive  performance  models 
from  domain  specific  knowledge  (usually  based  on  analytic  models  of  the  system).  These  models 
can  then  be  used  to  predict  the  error  structure  associated  with  a  specific  observation.  Under 
different  sets  of  assumptions,  there  are  analytic  and  heuristic  methods  for  finding  a  gradient. 

•  Example  4. 

This  example  demonstrates  the  method  for  recursive  estimation  of  a  location  in  the  x-y  plane. 
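
Let

$$\hat{X}_i = \begin{bmatrix} 100 \\ 100 \end{bmatrix}, \text{ and } \Sigma_{\hat{X}_i} = \begin{bmatrix} 10 & 5 \\ 5 & 30 \end{bmatrix}.$$

The correlation between successive estimates is assumed to be negligible.

Let the new observation be

$$Z_{i+1} = \begin{bmatrix} 95 \\ 110 \end{bmatrix}, \text{ with covariance matrix } \Sigma_{Z_{i+1}} = \begin{bmatrix} 15 & 10 \\ 10 & 15 \end{bmatrix}.$$

The new estimate is:

$$
\begin{aligned}
\hat{X}_{i+1} &= \begin{bmatrix} 100 \\ 100 \end{bmatrix} + \begin{bmatrix} 10 & 5 \\ 5 & 30 \end{bmatrix}
\left( \begin{bmatrix} 10 & 5 \\ 5 & 30 \end{bmatrix} + \begin{bmatrix} 15 & 10 \\ 10 & 15 \end{bmatrix} \right)^{-1}
\left( \begin{bmatrix} 95 \\ 110 \end{bmatrix} - \begin{bmatrix} 100 \\ 100 \end{bmatrix} \right) \\
&= \begin{bmatrix} 100 \\ 100 \end{bmatrix} + \begin{bmatrix} 10 & 5 \\ 5 & 30 \end{bmatrix}
\frac{1}{900} \begin{bmatrix} 45 & -15 \\ -15 & 25 \end{bmatrix}
\begin{bmatrix} -5 \\ 10 \end{bmatrix} \\
&= \begin{bmatrix} 97.6389 \\ 108.75 \end{bmatrix}.
\end{aligned}
$$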
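
The vector update can be checked numerically with a short plain-Python sketch restricted to 2 x 2 covariance matrices. The helper names (`inverse_2x2`, `matmul_2x2`, `gain`, `vector_update`) are hypothetical; the numbers are those of Example 3 (diagonal structures 5, 4 and 3, 6) and Example 4 as read from the text.

```python
def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def matmul_2x2(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def gain(cov_estimate, cov_observation):
    """Weight on the new observation: Sigma_x (Sigma_x + Sigma_z)^-1."""
    total = [[cov_estimate[i][j] + cov_observation[i][j] for j in range(2)]
             for i in range(2)]
    return matmul_2x2(cov_estimate, inverse_2x2(total))


def vector_update(x_hat, cov_estimate, z, cov_observation):
    """New estimate = old estimate + gain * (observation - old estimate)."""
    k = gain(cov_estimate, cov_observation)
    diff = [z[0] - x_hat[0], z[1] - x_hat[1]]
    return [x_hat[i] + k[i][0] * diff[0] + k[i][1] * diff[1] for i in range(2)]


# Example 3: diagonal structures decouple into two scalar problems.
k2 = gain([[5.0, 0.0], [0.0, 4.0]], [[3.0, 0.0], [0.0, 6.0]])

# Example 4: update a location estimate in the x-y plane.
x_new = vector_update([100.0, 100.0], [[10.0, 5.0], [5.0, 30.0]],
                      [95.0, 110.0], [[15.0, 10.0], [10.0, 15.0]])
print(k2, x_new)
```

The sketch returns only the updated estimate. Continuing the recursion would also require the covariance of the new estimate, which this illustration leaves out.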


11.  CONCLUSION 

Consider  again  the  estimation  problem  of  a  system  that  protects  a  tank  from  an  incoming 
projectile.  One  Markov  representation  of  the  projectile  is  its  trajectory  in  the  X,  Y,  and  Z 
dimensions.  If  there  is  more  than  one  sensor  estimating  the  location  of  the  projectile  at  a  given 
time  step,  then  these  should  be  combined  into  a  single  location  estimate.  If  it  is  acceptable  to 
assume  a  projectile  travels  at  a  constant  velocity  over  the  last  portion  of  its  path,  the  following 
three  equations  can  be  used  to  independently  estimate  the  projectile  trajectory  in  each  dimension. 

$$X = \alpha_1 + \alpha_2 t$$

$$Y = \beta_1 + \beta_2 t$$

$$Z = \gamma_1 + \gamma_2 t$$

Note that independence means parallel computations are possible. Each of the above can be
solved  using  recursive  least-squares  processing.  Improvements  in  the  estimation  are  made  by 
using  recursive  weighted  least-squares  estimation  where  the  weights  are  based  on  estimates  of 
the  error  in  each  dimension.  To  utilize  all  the  information  of  the  covariance  structure  of  each 
estimate  requires  "en  bloc"  type  updating  using  six  by  six  matrices  at  each  step  of  a  recursive 
weighted  least-squares  process.  After  evaluating  the  available  techniques,  a  final  decision  can 
be  based  on  concerns  of  computation  speed  and  acceptable  accuracy  of  the  estimator. 

The  wide  array  of  uses  for  least-squares  estimation  testifies  to  its  effectiveness.  The  key  to 
structuring  a  problem  for  a  least-squares  solution  is  finding  a  Markov  representation  of  the 
problem.  This  representation  defines  a  recursive  approach  to  estimation.  When  multiple 
estimates  are  available  at  each  time  step,  the  processing  time  can  be  decreased  by  using  data 
fusion  networks  to  reduce  the  information  to  a  single  estimate.  Hierarchical  networks  using  both 
parallel  and  serial  combination  of  data  can  be  devised.  The  cases  presented  herein  can  be  used 
to  preprocess  the  data  for  any  recursive  least-squares  method. 